The Atari Compendium

home *** CD-ROM | disk | FTP | other *** search

/ The Atari Compendium / The Atari Compendium (Toad Computers) (1994).iso / files / prgtools / langs / pcl-src.zoo / lap.txt < prev next >

Wrap

Text File | 1990-01-25 | 26.3 KB | 656 lines

-*- Mode: Text -*- Copyright (c) 1985, 1986, 1987, 1988, 1989 Xerox Corporation. All rights reserved. Use and copying of this document is permitted. Any distribution of this document must comply with all applicable United States export control laws. Last updated: 6/3/89 by Gregor 10/26/89 by Gregor -- added :RETURN, removed :ISHIFT This file contains documentation of the PCL abstract LAP code. Any port of PCL is required to implement the abstract LAP code interface. There is a portable, relatively good performance implementation in the file lap.lisp, port-specific implementations are in that file as well. The PCL abstract LAP code mechanism exists to provide PCL with a way to create high-performance method lookup functions. Using this mechanism, PCL can produce "LAP closures" which do the method lookup. By allowing PCL to specify these closures using abstract LAP code rather that Lisp code we hope to achieve the following: * Better runtime performance. By using abstract LAP code, we will get better machine instruction sequences than we would from compiling Lisp code. * Better load and update time performance. Because it should be possible to "assemble" the LAP code more quickly than compiling Lisp code, PCL will spend less time building the method lookup code. * Ability to use PCL without a compiler. The LAP assembler will still be required but this should be much smaller than the full lisp compiler. Of course, not all implementations of the LAP code mechanism will satisfy all of these goals. The first is the most important. In particular, many PCL ports will use the portable LAP implementation. KCL will use the portable implementation in all of its ports. Other Lisps may have custom LAP implementations for some ports and use the portable implementation for other ports. Some Lisps will have a custom LAP implementation but will nonetheless require the compiler to be loaded to generate LAP closure constructors. An important point is why we have chosen to take this route rather than have each implementation implement the method lookup codes itself. This was done because we are, at PARC, just beginning to study cache behavior for CLOS programs. As we learn more about this we will want to modify the caching strategy PCL uses. This architecture, because it leaves PCL to implement caching behavior makes it possible to do this. Once this study is complete, implementations may want to do their own, ultra high performance implementations of caching strategies. Production of LAP closures is a two step process. In the first step, a port-specific function is called to take abstract LAP code and produce a a "lap closure generator". Lap closure generators are functions which are called with a set of closure variable values and return a LAP closure. The intermediary of the lap closure generators provides an important optimization. Because it is assumed that producing the LAP closure generator can take much longer than producing a LAP closure from the generator, PCL attempts to make only one closure generator for each sequence of LAP code. Because of the way PCL generates the LAP code sequences, this is quite easy for it to do. The rest of this document is divided into six parts. * the metatypes std-instance and fsc-instance * an abstraction for simple vector indices * important optimizations * the port specific function for making lap closure generators * the actual abstract LAP code * examples *** The metatypes STD-INSTANCE and FSC-INSTANCE *** In PCL, instances with metaclass STANDARD-CLASS are represented using the metatype STD-INSTANCE. (Note that in Cinco de Mayo PCL, this metatype is called IWMC-CLASS.) Each port must implement this metatype. The metatype could be implemented by the following DEFSTRUCT. (defstruct (std-instance (:predicate std-instance-p) (:conc-name %std-instance-) (:constructor %allocate-std-instance (wrapper slots)) (:constructor %allocate-std-instance-1 ()) (:print-function print-std-instance)) (wrapper nil) (slots nil)) PCL itself will guarantee correct access to this structure and the accessors and constructors. With this in mind, the following are important. * Being able to type test this structure quickly is critical. See the :STD-INSTANCE-P opcode. * The allocation functions should compile inline, do no argument checking and be as fast as possible. * The accessor functions should compile inline, do no checking of their arguments and be as fast as possible. SETF of the accessors should do the same. The port is also required to implement the metatype FSC-INSTANCE (called FUNCALLABLE-INSTANCE, or FIN for short, in Cinco de Mayo PCL). Objects with this metatype are used, among other things, to implement generic functions. These objects have field structure associated with them and are also functions that can be applied to arguments. The fields are the same as those for STD-INSTANCE, the FSC-INSTANCE metatype has predicates, print-functions, constructors and accessors as follows: fsc-instance-p print-fsc-instance %fsc-instance-wrapper %fsc-instance-slots %allocate-fsc-instance (wrapper slots) %allocate-fsc-instance-1 () In addition, objects of metatype FSC-INSTANCE have a property called the funcallable instance function. When an FSC-INSTANCE is applied to arguments, the funcallable instance function is what is actually called. The funcallable instance function of an FSC-INSTANCE can be changed using the function SET-FUNCALLABLE-INSTANCE-FUNCTION. There is no mechanism for obtaining the funcallable instance function of an FSC-INSTANCE. It is possible to implement the FSC-INSTANCE metatype in pure Common Lisp. A simple implementation which uses lexical closures as the instances and a hash table to record that the lexical closures are of metatype FSC-INSTANCE is easy to write. Unfortunately, this implementation adds significant overhead: to generic-function-invocation (1 function call) to slot-access (1 function call or one hash table lookup) to class-of a generic-function (1 hash-table lookup) In addition, it would prevent the FSC-INSTANCEs from being garbage collected. In short, the pure Common Lisp implementation really isn't practical. Note that previous implementations of FINS were always based on the lexical closure metatype. In some ports, that provides poor performance. Those ports may want to consider reimplementing to use the compiled code metatype. In that implementation strategy, LAP closure variables would become constants of the compiled code object. The following note from JonL is of interest when working on a FIN implementation: Date: Tue, 16 May 89 05:45:56 PDT From: Jon L White <jonl@lucid.com> This isn't a bug in Lucid's compiler -- it's a lurking bug in PCL that will "bite" most implementations where different settings of the compiler optimization switches will produce morphologically different (but of course functionally equivalent) function objects. The difficulty is in how discriminator codes service cache misses. They "call out" to (potentially) random functions that will in some cases "smash" the function object that was actually running as the discriminator code. This is all right providing you don't return to that function frame, but alas ... I know this is a more extensive problem because the code in the port-independent function 'notice-methods-change' goes out of its way to do a tail-recursive call to the function that is going to smash the possibly-executing discriminator code. Here is the commentary from that code (sic): ;; In order to prevent this we take a simple measure: we just ;; make sure that it doesn't try to reference our its own closure ;; variables after it makes the dcode change. This is done by ;; having notice-methods-change-2 do the work of making the change ;; AND calling the actual generic function (a closure variable) ;; over. This means that at the time the dcode change is made, ;; there is a pointer to the generic function on the stack where ;; it won't be affected by the change to the closure variables. A similar thing should be done in the construction of standard-accessor, checking, and caching dcodes. In an experimental version here at Lucid, I rewrote dcode.lisp to do that, and there is no problem with it. Although that code is somewhat Lucid-specific, it could be of help to someone who wanted to rewrite the generic dcode.lisp (no pun intended). Contact me privately if you are interested. Doing a tail-recursive call out of dcodes when there is a cache miss is a good thing, regardless of other problems. I think one might as well do it. However, I should point out that in the presence of multiprocessing, there is another more serious problem that cannot be solved so simply. Think about what happens when one process decides to update a dcode while another process is still using it; no such stack-maintenance discipline will fix this case. A tail-recursive exit from the dcode will *immensely* reduce the likelihood that another process can sneak in during the interval in which the dcode requires consistency in its function; but it can't reduce that likelihood to zero. The more desirable thing to do is to put the whole "dcode" down one more level of indirection through the symbol-function cell of the generic function. This is effectively what PCL's 'make-trampoline' function does, but unfortunately that is not a very efficient approach when you consider how most compilers will compile it. Something akin to the "mattress-pads" in Steve Haflich's code (in the fin.lisp file) could probably be done for many other implementations as well. *** Index Operations *** Indexes are an abstraction for indexes into a simple vector. This abstraction is used to make it possible to generate more efficient code to access simple vectors. The idea being that this may make it possible to use alternate addressing modes to address these. The "index value" of an index is defined to be the fixnum of which that index is an alternate form. So, using the Lisp function SVREF with the index value of an index accesses the same element as using the index with the appropriate access function or operand. The format of an index is unspecified, but is assumed to be something like a fixnum with certain bits ignored. Accessing a vector using an index must be done using the appropriate special accessor function or opcode. Conversion from index values to indices and vice-versa can be done with the following functions: INDEX-VALUE->INDEX (index-value) INDEX->INDEX-VALUE (index) The following constant indicates the maximum index value an index can have in a given port. This must be at least 2^16. INDEX-VALUE-LIMIT - a fixnum, must be at least 2^16. MAKE-INDEX-MASK (<cache-size> <line-size>) This function is used to make index masks. Because I am lazy, I show an implementation of it in the common case where indexes are just fixnums: (defun make-index-mask (cache-size line-size) (let ((cache-size-in-bits (floor (log cache-size 2))) (line-size-in-bits (floor (log line-size 2))) (mask 0)) (dotimes (i cache-size-in-bits) (setq mask (dpb 1 (byte 1 i) mask))) (dotimes (i line-size-in-bits) (setq mask (dpb 0 (byte 1 i) mask))) mask)) *** Optimizations *** This section discusses two important optimizations related to LAP closures. The first relates to calling LAP closures themselves, the second relates to calling other functions from LAP closures. The important point about calling LAP closures is that almost all of the time, LAP closures will be used as the funcallable-instance-function of funcallable instances. It is required that LAP closures be funcallable themselves, but usually they will be stored in a FIN and the fin will then be funcalled. This brings up several optimizations, including ones having to do with access to the closure variables of a LAP closure. When a LAP closure is used to do method lookup, the function the LAP closure ends up calling has the same number of required arguments as the LAP closure itself. Since the LAP closure must check its required arguments to do the lookup, it is redundant for the function called to do so as well. Since LAP closures do all calls in a tail recursive way, it should even be possible to optimize out certain parts of the normal stack frame initialization. A similar situation occurs between effective method functions and the individual method functions; the difference is that in effective method functions, the calls are not necessarily tail recursive. Consequently, it would be nice to have a way to call certain functions and inhibit the checking of required arguments. This is made possible by use of the PCL-FAST-APPLY and PCL-FAST-FUNCALL macros together with the PCL-FAST-CALL compiler declaration. The PCL-FAST-CALL compiler declaration declares that a function may be fast called. Not all callers of the function will necessarily fast call it, but most probably will. The :JMP opcode can only be used to call a function compiled with the PCL-FAST-CALL declaration. The PCL-FAST-APPLY and PCL-FAST-FUNCALL macros are used to fast call a function. The function argument must be a compiled function that has the PCL-FAST-CALL compiler declaration in its lambda declarations. The basic idea is that the PCL-FAST-CALL compiler declaration causes the compiler to set up an additional entrypoint to the function. This entrypoint comes after checking of required arguments but before processing of other arguments. Note: When FAST-APPLY is used, the required arguments will be given as separate arguments and all other arguments will appear as a single spread argument. For example: (let ((fn (compile () '(lambda (a b &optional (c 'z)) (declare (pcl-fast-call)) (list a b c))))) (pcl-fast-apply fn 'x 'y ()) ;legal (pcl-fast-apply fn 'x 'y '(foo)) ;legal (pcl-fast-apply fn '(a b c)) ;illegal ) *** Producing LAP Closure Generators *** Each implementation of the LAP code mechanism must provide a port specific function making lap closure generators. In the portable implementation, this function is called PLAP-CLOSURE-GENERATOR. In ExCL it should be called EXCL-LAP-CLOSURE-GENERATOR etc. At any time, the value of the variable *make-lap-closure-generator* is a symbol which names the function currently being used to make lap closure generators. The port specific function must accept arguments as follows: PLAP-CLOSURE-GENERATOR (<closure-vars> <args> <index-registers> <simple-vector-registers> <lap-code>) This returns a lap-closure generator. A lap-closure generator is a function which is called with a number of arguments equal to the length of <closure-vars>. These arguments are the values of the closure variables for the lap closure. These values cannot be changed once the LAP closure is created. PCL takes care of keeping track of lap-closure-generators it already has on hand and reusing them. The function RESET-LAP-CLOSURE-GENERATORS can be called to force PCL to forget all the lap closure generators it has remembered. <args> A list of symbols. This provides a way to name particular arguments to the LAP closure. Arguments which will not be referenced by name are given as NIL. All required arguments to the LAP closure are explicitly included (perhaps as NIL). If &REST appears at the end of arguments it means that non-required arguments are allowed, these will be processed by the methods. If &REST does not appear at the end of arguments, the lap closure should signal an error if more than the indicated number of arguments are supplied. Examples: - (obj-0 obj-1) Specifies a two argument lap closure. If more or less than two arguments are supplied an error is signalled. Within the actual lap code, both arguments can be referenced by name (see the :ARG operand). - (obj-0 nil &rest) Specifies a two or more argument lap closure. If less than two arguments are supplied an error is signalled. Within the actual lap code, the first argument can be referenced by name (see the :ARG operand). <closure-vars> A list of symbols. The closure will have these as closure variables. Within the lap code these can be accessed using the :CVAR operand. The lap code cannot change these values. SET-FUNCALLABLE-INSTANCE-FUNCTION is permitted to have the special knowledge that there are at most ?? of these and to be prepared to do something special when the funcallable instance function of a funcallable instance is set to a lap closure. <index-registers> A list of register numbers. These registers will be used only to hold indexes. Other registers may be used to hold indexes as well, but the only values put into these registers will be indexes. <simple-vector-registers> A list of register numbers. These registers will be used only to hold simple-vectors. Other registers may be used to hold simple-vectors as well, but the only values put into these registers will be simple-vectors. <lap-code> The actual lap code for this closure. This is a list of LAP code opcodes. See the section "Abstract LAP Code" for more details. Each implementation must also supply a function named PRE-MAKE-xxx where xxx is the same as the name of its make-lap-closure-generator function. The macro doesn't evaluate its arguments, and when it appears in a file it should try to do some of the work at load time. It might appear in a file like this: (eval-when (load) (setq 1-arg-std-lap (pre-make-plap-closure-generator ...))) *** Abstract LAP Code *** Each lap code operand has the form: (opcode operand1 ... operandn). In some cases, the distinction between an operand and an opcode is somewhat arbitrary. In general, opcodes have a significant "action" component to their behavior. Operands select a piece of data to operate on. Some operands select their data in a more complex way, but they are operands anyways. All data must be in a register before it can be operated on. This requirement means that the only place a non-register operand can appear is as the first argument to the :move opcode. (Actually, there is one other exception, a :iref operand can be the target of a move as well.) Moreover, only register operands can appear as the second argument to the :move opcode and this register must not appear in the <from> operand. >> The operands are: (:reg <n>) A pseudo register. <n> is an integer in the range [0 , 31]. A particular implementation can map this to a real register, a memory location or the stack. The abstract LAP code itself does not include the notion of a stack. PCL will attempt to optimize register use in two ways. PCL itself will attempt to re-use registers whenever possible. That is, the port should not have to worry with doing live register analysis for the registers. In addition, PCL will consider lower numbered registers to be "faster" than higher numbered ones. (:cvar <name>) A closure variable of the lap-closure. <name> is a symbol. (:arg <name>) An argument to the LAP closure. <name> is a symbol. (:std-wrapper <from>) (:fsc-wrapper <from>) (:built-in-wrapper <from>) (:structure-wrapper <from>) (:other-wrapper <from>) Get the class wrapper of <from>. For std-instances and fsc-instances this just fetches the wrapper field. The specific port is required to implement fast access to the wrappers of built-in, structure and other metatypes. A callback mechanism allows the port to ask PCL to generate a class and wrapper for objects for which no class and wrapper exists yet. This mechanism is <<to be spec'd out>>. (:std-slots <operand>) (:fsc-slots <operand>) Fetch the slots field of a std-instance or a fsc-instance. (:constant <constant>) This just allows inline constants. <constant> can be any Lisp object. The following operands operate on indexes. Each is patterned after a Lisp function which would have a corresponding effect on the index value of the index. (:i1+ <index>) (:i+ <index-1> <index-2>) (:i- <index-1> <index-2>) (:ilogand <index-1> <index-2>) (:ilogxor <index-1> <index-2>) Like the corresponding Lisp functions. (:iref <vector> <index>) Like the SVREF function. <vector> must be a simple vector. (:cref <vector> <constant>) The :cref operand is for constant vector references. <constant> must be a fixnum. >> The opcodes are: (:move <from> <to>) A full word move operation. (:eq <from1> <from2> <label>) (:neq <from1> <from2> <label>) EQ and NOT EQ conditional branches. If the value contained in <from1> is EQ (or not) to the value contained in <from2>, jump to <label>. Otherwise continue execution at the next opcode. <label> is a symbol. (:fix= <from1> <from2> <label>) A fixnum = conditional branch. (:izerop <from> <label>) Jump to label if and only if the index <from> is 0. (:std-instance-p <from> <destination-label>) (:fsc-instance-p <from> <destination-label>) (:built-in-instance-p <from> <destination-label>) (:structure-instance-p <from> <destination-label>) Test the metatype of <from> and dispatch. If the metatype of from is of the specified type execution jumps to <destination-label>. Otherwise, execution proceeds normally at the next opcode. See the :xxx-wrapper operands. (:jmp <function>) fcn contains a function to call. This must be a compiled function, which had the PCL-FAST-CALL declaration in it. The call should be "tail recursive" in that whatever values it returns should be returned immediately from the LAP closure itself. Method lookup is implemented by finding a function in the cache and then using :JMP to call it. The various kinds of traps are implemented by using :JMP to call a trap function that was stored in one of the closure variables. (:return <value>) Return immediately from the LAP closure. <value> is the single value to return. In certain cases of method lookup when all the methods are accessor methods, there is an important optimization which implements the slot access in the discriminating function itself. This opcode is used in that case. (:label <label>) Identifies a point in the lap code. <label> is a symbol. This can be the target of any of the control transfer opcodes (:GO, :EQ, :NEQ, :FIX=, :STD-INSTANCE-P, :FSC-INSTANCE-P, :STRUCTURE-INSTANCE-P, :BUILT-IN-INSTANCE-P) (:go <label>) Jump to the label <label>. <label> is a symbol. *** Examples *** Here is an example of the use of the abstract LAP mechanism. It doesn't use all operands or opcodes, but it is representative of the LAP sequences PCL will use. Several things are worth noting: * This is, I believe, just about the simplest such sequence. There are some others of comparable simplicity, but none much simpler. * A total of only 5 registers are used. I haven't checked, but I am pretty sure most all such code sequences will get by with 16 or less and many will stay under 8. (defun make-1-arg-std-dfun () (let ((cg (making-lap-closure-generator (initialize-lap-cvars '(cache mask reflect trap)) (initialize-lap-args '(a0)) (initialize-lap-registers 4 4 3) (let ((cache (allocate-lap-register 'simple-vector)) (count (allocate-lap-register)) (wrapper (allocate-lap-register)) (index (allocate-lap-register 'index)) (t1 (allocate-lap-register)) (t2 (allocate-lap-register)) (i1 (allocate-lap-register 'index)) (i2 (allocate-lap-register 'index))) (opcode :move (operand :cvar 'cache) cache) (opcode :move (operand :arg 'a0) t1) (opcode :std-instance-p t1 'std-instance) (opcode :go 'trap) (opcode :label 'std-instance) ;; ;; The stable register pattern for the rest of the code is: ;; cache Cache ;; count Cache count ;; wrapper Wrapper ;; index index under consideration ;; ;; Whenever we jump to HIT, this pattern must be in force. ;; (opcode :move (operand :std-wrapper t1) wrapper); (opcode :move (operand :cref cache 0) count) ; (opcode :move (operand :cref wrapper 0) i2) ;wrapper-no -> i2 (opcode :move (operand :cvar 'mask) i1) ;mask -> i1 (opcode :move (operand :ilogand i1 i2) index) ; ; (opcode :move (operand :iref cache index) t1) ;key -> t1 (opcode :eq t1 wrapper 'hit) ; (opcode :move (operand :cvar 'reflect) i1) ;reflect -> i1 (opcode :move index i2) ;index -> i2 (opcode :move (operand :i- i1 i2) index) ; (opcode :move (operand :iref cache index) t1) ;key -> t1 (opcode :eq t1 wrapper 'hit) ; (opcode :go 'trap) ;; ;; To do the hit, we use registers as follows: ;; 0 cache comes in here ;; 1 cache count comes in here ;; 2 use this for the function ;; 3 index comes in here ;; 4 1+ index, then new count ;; (opcode :label 'hit) (opcode :move (operand :i1+ index) i1) (opcode :move (operand :iref cache i1) t1) (opcode :move (operand :cref cache 0) t2) (opcode :fix= count t2 'call) (opcode :go 'trap) (opcode :label 'call) (opcode :jmp t1) (opcode :label 'trap) (opcode :move (operand :cvar 'trap) t1) (opcode :jmp t1))))) (funcall cg (make-array 16) (make-index-mask 16 2) (- 16 2) #'(lambda (a) (declare (pcl-fast-call) (ignore a)) (break "Trap")))))